Abstract. We reminisce and discuss applications of algorithmic probability
to a wide range of problems in artificial intelligence, philosophy,
and technological society. We propose that Solomonoff has effectively
axiomatized the field of artificial intelligence, therefore establishing it as a
rigorous scientific discipline. We also relate to our own work in incremental
machine learning and philosophy of complexity.
1 Introduction



Ray Solomonoff was a pioneer in mathematical Artificial Intelligence (AI), whose
proposal of Algorithmic Probability (ALP) has led to diverse theoretical
consequences and applications, most notably in AI. In this paper, we try to give a
sense of the significance of his theoretical contributions, reviewing the essence
of his proposal in an accessible way, and recounting a few seemingly unrelated,
diverse consequences which, in our opinion, hint towards a philosophically clear
world-view that has rarely been acknowledged by the greater scientific
community. That is to say, we try to give the reader a glimpse of what it is like to
consider the consequences of ALP, and what ideas might lie behind the
theoretical model, as we imagine them.
Let M be a reference machine which corresponds to a universal computer^1
with a prefix-free code. In a prefix-free code, no code is a prefix of another. This
is also called a self-delimiting code, as most reasonable computer programming
languages are. Solomonoff inquired into the probability that an output string x is
generated by M, considering the whole space of possible programs. By giving
each program bitstring p an a priori probability of $2^{-|p|}$, we can ensure that
the space of programs meets the probability axioms (by the extended Kraft
inequality [5]). In other words, we imagine that we toss a fair coin to generate
each bit of a random program. This probability model of programs entails the
following probability mass function (p.m.f.) for strings $x \in \{0,1\}^*$:

$$P_M(x) = \sum_{M(p) = x*} 2^{-|p|} \qquad (1)$$

which is the probability that a random program will output a string that begins
with x (written x* above). $P_M(x)$ is called the algorithmic probability of x, for
it assumes the definition of program-based probability. We use P when M is clear
from the context to avoid clutter.
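
As a minimal sketch of definition (1), the following Python fragment replaces the
universal machine M with a small, hypothetical lookup table from prefix-free
program bitstrings to outputs. The table, the names TOY_MACHINE, is_prefix_free,
kraft_sum and alp, and the reading of the sum as ranging over programs whose
output begins with x are all assumptions made for illustration; a real reference
machine is universal and its sum cannot be enumerated this way.

```python
from fractions import Fraction

# Hypothetical toy "reference machine": a finite table from program
# bitstrings to output strings. This is an assumption for illustration,
# not a universal computer.
TOY_MACHINE = {
    "0": "0101",
    "10": "0110",
    "110": "01",
    "1110": "1111",
}


def is_prefix_free(codes):
    """True if no code is a prefix of another (checked on sorted neighbours)."""
    ordered = sorted(codes)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def kraft_sum(codes):
    """Sum of 2^{-|p|} over all programs; at most 1 for a prefix-free set."""
    return sum(Fraction(1, 2 ** len(p)) for p in codes)


def alp(x, machine=TOY_MACHINE):
    """Sum 2^{-|p|} over the programs whose output begins with x."""
    return sum(
        Fraction(1, 2 ** len(p))
        for p, output in machine.items()
        if output.startswith(x)
    )


print(is_prefix_free(TOY_MACHINE), kraft_sum(TOY_MACHINE), alp("01"))
```

Exact fractions are used so that the Kraft sum can be compared with 1 without
rounding error.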



^1 Optionally, it can be probabilistic to deal with general induction problems,
i.e., it has access to a random number generator [1, Section 4].



2 Solomonoff Induction 

Using this probability model of bitstrings, one can make predictions. Intuitively,
we can state that it is impossible to imagine intelligence in the absence of any
prediction ability: purely random behavior is decisively non-intelligent. Since P
is a universal probability model, it can be used as the basis of universal
prediction, and thus intelligence. Perhaps Solomonoff's most significant contributions
were in the field of AI, as he envisioned a machine that can learn anything from
scratch. Reviewing his early papers such as [3,4], we see that he has established
the theoretical justification for the machine learning and data mining fields. Few
researchers could ably make claims about universal intelligence as he did.
Unfortunately, not all of his ideas have reached fruition in practice; yet there is little
doubt that his approach was the correct basis for a science of intelligence.

His main proposal for machine learning is inductive inference [5,6], circa 1964,
for a variety of problems such as sequence prediction, set induction, operator
induction and grammar induction [7]. Without much loss of generality, we can
discuss sequence prediction on bitstrings. Assume that there is a computable
p.m.f. of bitstrings $P_1$. Given a bitstring x drawn from $P_1$, we can define the
conditional probability of the next bit simply by normalizing (1) [7].
Algorithmically, we would have to approximate (1) by finding short programs that generate
x (the shortest of which is the most probable). In more general induction, we
run all models in parallel, quantifying fit-to-data, weighted by the algorithmic
probability of the model, to find the best models and construct distributions [7];
the common point being determining good models with high a priori probability.
Finding the shortest program in general is undecidable; however, Levin search [8]
can be used for this purpose. There are two important results about Solomonoff
induction that we shall mention here. First, Solomonoff induction converges very
rapidly to the real probability distribution. The convergence theorem shows that
the expected total square error is related only to the algorithmic complexity of
$P_1$, which is independent of x. The following bound [9] is discussed at length
in [10] with a concise proof:
$$\mathbb{E}\left[\text{total squared prediction error}\right] \le -\frac{1}{2} \ln P(P_1) \qquad (2)$$

This bound characterizes the divergence of the ALP solution from the real
probability distribution $P_1$. $P(P_1)$ is the a priori probability of the p.m.f. $P_1$
according to our universal distribution $P_M$. On the right-hand side of (2),
$-\ln P_M(P_1)$ is roughly $k \ln 2$, where k is the Kolmogorov complexity of $P_1$
(the length of the shortest program that defines it); thus the total expected error
is bounded by a constant, which guarantees that the error decreases very rapidly
as example size increases. Secondly, there is an optimal search algorithm to
approximate Solomonoff induction, which adopts Levin's universal search method to
solve the problem of universal induction [8,11]. The universal search procedure
time-shares all candidate programs according to their a priori probability, with a
clever watchdog policy to avoid the practical impact of the undecidability of the
halting problem [11]. The search procedure starts with a time limit $t = t_0$; in
each iteration it tries all candidate programs c with a time limit of $t \cdot P(c)$,
and while a solution is not found, it doubles the time limit t. The time
$t(s)/P(s)$ for a solution program s taking time $t(s)$ is called the Conceptual
Jump Size (CJS), and it is easily shown that Levin search terminates in at most
$2 \cdot CJS$ time. To obtain alternative solutions, one may keep running after the
first solution is found, as there may be more probable solutions that need more
time. The optimal solution is computable only in the limit, which turns out to be
a desirable property of Solomonoff induction, as it is complete and uncomputable
[12, Section 2]. An explanation of Levin's universal search procedure and its
application to Solomonoff induction may be found in [8,11,13].
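
The doubling schedule of universal search can be sketched in a few lines of
Python. Everything concrete here is an assumption made for illustration: the
three-instruction toy language (inc, dbl, sqr), its 2-bit encoding with a 2-bit
halt code (which makes programs self-delimiting, so that a program with n
instructions has prior 4^{-(n+1)}), and the cost model of one step per
instruction plus one for the halt. It is not the search procedure of [8,11],
only an instance of the same time-sharing idea.

```python
from itertools import product

# Toy instruction set (hypothetical): each op is a 2-bit code and "11" halts,
# so every program is a self-delimiting bitstring of length 2 * (ops + 1).
OPS = {"inc": lambda v: v + 1, "dbl": lambda v: v * 2, "sqr": lambda v: v * v}


def prior(n_ops):
    """A priori probability 2^{-|p|} of any program with n_ops instructions."""
    return 2.0 ** (-2 * (n_ops + 1))


def run(ops, max_steps, start=1):
    """Apply ops to `start`; each op and the final halt cost one step.

    Returns None when the program would exceed its step budget.
    """
    if len(ops) + 1 > max_steps:
        return None
    value = start
    for op in ops:
        value = OPS[op](value)
    return value


def levin_search(target, t0=1, max_phases=30):
    """Return (program, time limit of the successful phase), or None."""
    t = t0
    for _ in range(max_phases):
        n_ops = 0
        # Only programs whose share t * P(c) buys at least one step are tried.
        while t * prior(n_ops) >= 1:
            budget = int(t * prior(n_ops))
            for ops in product(OPS, repeat=n_ops):
                if run(ops, budget) == target:
                    return ops, t
            n_ops += 1
        t *= 2  # no solution within this phase: double the time limit
    return None


program, phase_limit = levin_search(16)
print(program, phase_limit)
```

In this toy setting the first program found for the target 16 has three
instructions, so t(s) = 4 steps and P(s) = 2^{-8}, giving a CJS of 1024. That is
also the time limit of the phase in which the sketch succeeds, and the time
limits of all phases up to that point sum to less than 2 CJS.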

3 The Axiomatization of Artificial Intelligence 

We believe in fact that Solomonoff's work was seminal in that he has
single-handedly axiomatized AI, discovering the minimal necessary conditions for any
machine to attain general intelligence (based on our interpretation of [1]).
Informally, these axioms are:

AI0 AI must have in its possession a universal computer M (Universality).
AI1 AI must be able to learn any solution expressed in M's code (Learning
    recursive solutions).
AI2 AI must use probabilistic prediction (Bayes' theorem).
AI3 AI must embody in its learning a principle of induction (Occam's razor).

While it may be possible to give a more compact characterization, these are
ultimately what is necessary for the kind of general learning that Solomonoff
induction achieves. ALP can be seen as a complete formalization of Occam's
razor (as well as Epicurus's principle) [14] and thus serves as the foundation
of universal induction, capable of solving all AI problems of significance. The
axioms are important because they allow us to assess whether a system is capable
of general intelligence or not.

Obviously, AI1 entails AI0; therefore AI0 is redundant and can be omitted
entirely. However, we stated it separately for historical reasons, as one of the
landmarks of early AI research, in retrospect, was the invention of the universal
computer, which goes back to Leibniz's idea of a universal language
(characteristica universalis) that can express every statement in science and mathematics,
and has found its perfect embodiment in Turing's research [15,16]. A related
achievement of early AI was the development of LISP, a universal computer
based on lambda calculus (which is a functional model of computation) that has
shaped much of early AI research.

See also a recent survey about inductive inference [17] with a focus on the
Minimum Message Length (MML) principle introduced in 1968 [18]. The MML principle
is also a formalization of induction developed within the framework of
classical information theory, which establishes a trade-off between model complexity
and fit-to-data by finding the minimal message that encodes both the model
and the data [19]. This trade-off is quite similar to the earlier forms of
induction that Solomonoff developed, although independently discovered. Dowe points
out that Occam's razor means choosing the simplest single theory when data is
equally matched, which MML formalizes perfectly (and is functional otherwise
in the case of unequal fits), while Solomonoff induction maintains a mixture of
alternative solutions [17, Sections 2.4 & 4]. On the other hand, the diversity of
solutions in ALP is seen as desirable by Solomonoff himself [12], and in a recent
philosophical paper which illustrates how Solomonoff induction dissolves various
philosophical objections to induction [14]. Nevertheless, it is well worth
mentioning that Solomonoff induction (formal theory published in 1964 [5,6]), MML
(1968), and Minimum Description Length [20] formalizations, as well as
Statistical Learning Theory [21] (initially developed in 1960), all provide a principle
of induction (AI3). However, it was Solomonoff who first observed the
importance of universality for AI (AI0-AI1). The plurality of probabilistic approaches
to induction supports the importance of AI3 (as well as hinting that diversity
of solutions may be useful). AI2, however, does not require much explanation.
Some objections to Bayesianism are answered using MML in [22]. Please also see
an intriguing paper by Wallace and Dowe [23] on the relation between MML and
Kolmogorov complexity, which states that Solomonoff induction is tailored to
prediction rather than inference, and recommends non-universal models in
practical work, and therefore becomes incompatible with the AI axioms (AI0-AI1).
Ultimately, empirical work will illuminate whether our AI axioms should be adopted,
or more restrictive models are sufficient for universal intelligence; therefore such
alternative viewpoints must be considered. In addition to this, Dowe discusses
the relation between inductive inference and intelligence, and the requirements
of intelligence, as we do elsewhere [17, Section 7.3]. Also relevant is an adaptive
universal intelligence test that aims to measure the intelligence of any AI agent,
and discusses various definitions of intelligence [24].

4 Incremental Machine Learning 

In solving a problem of induction, the aforementioned search methods suffer from
the huge computational complexity of trying to compress the entire input. For
instance, if the complexity of the p.m.f. $P_1$ is about 400 bits, Levin search would
take on the order of $2^{400}$ times the running time of the solution program, which is
infeasible (quite impossible in the observed universe). Therefore, Solomonoff has
suggested using an incremental machine learning algorithm, which can re-use
information found in previous solutions [13].

The following argument illustrates the situation more clearly. Let $P_1$ and $P_2$
be the p.m.f.'s corresponding to a training sequence of two induction problems
(any of them, not necessarily sequence prediction, to which others can be reduced
easily) with data $\langle d_1, d_2 \rangle$. Assume that the first problem has been
solved (correctly) with universal search. It has taken at most
$2 \cdot CJS_1 = 2 \cdot t(s_1)/P(s_1)$ time. If the second problem is solved in an
incremental fashion, making use of the information from $P_1$, then the running
time of discovering a solution $s_2$ for $d_2$ reduces, depending on the success of
information transfer across problems. Here, we quantify how much in familiar
probabilistic terms.

In [10], Solomonoff describes an information theoretic interpretation of ALP,
which suggests the following entropy function:

$$H^*(x) = -\log_2 P(x) \qquad (3)$$

This entropy function has perfect sub-additivity of information according to the
corresponding conditional entropy definition:

$$P(y|x) = \frac{P(x,y)}{P(x)} \qquad (4)$$

$$H^*(y|x) = -\log_2 P(y|x) \qquad (5)$$

$$H^*(x,y) = H^*(x) + H^*(y|x) \qquad (6)$$

This definition of entropy thus does not suffer from the additive constant terms
as in Chaitin's version. We can instantly define mutual entropy:

$$H^*(x : y) = H^*(x) + H^*(y) - H^*(x,y) = H^*(y) - H^*(y|x) \qquad (7)$$

which trivially follows.
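
The identities (6) and (7) can be checked numerically. The sketch below is only
an illustration under stated assumptions: an ordinary four-entry joint p.m.f.
(the hypothetical table JOINT) stands in for the universal distribution P, its
marginals are obtained by summation, and (4) is read as the usual ratio
P(x,y)/P(x). The function names are likewise made up for this example.

```python
from math import log2

# Hypothetical joint p.m.f. over pairs (x, y), standing in for P(x, y).
JOINT = {
    ("a", "u"): 0.5,
    ("a", "v"): 0.25,
    ("b", "u"): 0.125,
    ("b", "v"): 0.125,
}


def p_x(x):
    """Marginal probability of x under the toy joint p.m.f."""
    return sum(p for (xi, _), p in JOINT.items() if xi == x)


def p_y(y):
    """Marginal probability of y under the toy joint p.m.f."""
    return sum(p for (_, yi), p in JOINT.items() if yi == y)


def h(p):
    """Entropy of a single outcome in bits, as in (3): -log2 of its probability."""
    return -log2(p)


def h_cond(y, x):
    """H*(y|x), using the conditional probability P(x, y) / P(x)."""
    return h(JOINT[(x, y)] / p_x(x))


def h_joint(x, y):
    """H*(x, y) computed directly from the joint probability."""
    return h(JOINT[(x, y)])


def h_mutual(x, y):
    """H*(x : y) from the first form in (7)."""
    return h(p_x(x)) + h(p_y(y)) - h_joint(x, y)


for (x, y) in JOINT:
    # (6): the joint entropy splits exactly, with no additive constant.
    assert abs(h_joint(x, y) - (h(p_x(x)) + h_cond(y, x))) < 1e-12
    # (7): both forms of the mutual entropy agree.
    assert abs(h_mutual(x, y) - (h(p_y(y)) - h_cond(y, x))) < 1e-12
```

Note that these are pointwise quantities for a particular pair (x, y), so the
mutual entropy of an individual pair may be negative in such a toy table.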

A KUSP machine is a universal computer that can store data and methods
in additional storage. In 1984, Solomonoff observed that KUSP machines are
especially suitable for incremental learning [11]. In our work [25], we found that
the incremental learning approach was indeed useful (as in the preceding OOPS
algorithm [26]). Here is how we interpreted incremental learning. After each
induction problem, the p.m.f. P is updated; thus for every new problem a new
probability distribution is obtained. Although we are using the same reference
machine M for trial programs, we are referring to implicit KUSP machines which
store information about the experience of the machine so far, in subsequent
problems. In our example of two induction problems, let the updated P be called P';
naturally there will be an update procedure which takes time $t_u(P, s_1)$. Just how
much time can we expect to save if we use incremental learning instead of
independent learning? First, let us write the time bound $2 \cdot t(s)/P(s)$ as
$t(s) \cdot 2^{H^*(s)+1}$. If $s_1$ and $s_2$ are not algorithmically independent, then
$H^*(s_2|s_1)$ is smaller than $H^*(s_2)$. Independently, we would have
$t(s_1) \cdot 2^{H^*(s_1)+1} + t(s_2) \cdot 2^{H^*(s_2)+1}$; together, we will have, in
the best case, $t(s_1) \cdot 2^{H^*(s_1)+1} + t(s_2) \cdot 2^{H^*(s_2|s_1)+1}$ for the
search time, assuming that recalling $s_1$ takes no time for the latter search task
(which is an unrealistic assumption). Therefore in total, the latter search task can
accelerate $2^{H^*(s_1 : s_2)}$ times, and we can save
$t(s_2) \cdot 2^{H^*(s_2)+1} (1 - 2^{-H^*(s_1 : s_2)}) - t_u(P, s_1)$
total time in the best case (only an upper bound since we did not account for
recall time). Note that the maximum temporal gain is related to both how much
mutual information is discovered across solutions (thus $P_i$'s), and how much
time the update procedure takes. Clearly, if the update time dominates overall,
incremental learning is in vain. However, if updates are effective and efficient,
there is enormous potential in incremental machine learning.
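
To make the best-case accounting above concrete, here is a small numeric sketch
in Python. It assumes the time bound 2 t(s)/P(s) rewritten as t(s) 2^(H*(s)+1),
and the relation H*(s2|s1) = H*(s2) - H*(s1 : s2) from (7). The function names
and all input figures (running time, entropies in bits, update time) are
hypothetical values chosen for illustration, not measurements.

```python
def search_bound(t_s, h_bits):
    """Universal search time bound 2 * t(s) / P(s) = t(s) * 2^(H*(s) + 1)."""
    return t_s * 2.0 ** (h_bits + 1)


def incremental_gain(t2, h2, h_mutual, t_update):
    """Best-case speedup factor and time saved on the second problem.

    t2       -- running time t(s2) of the second solution
    h2       -- H*(s2) in bits
    h_mutual -- H*(s1 : s2) in bits, so that H*(s2|s1) = h2 - h_mutual
    t_update -- time t_u(P, s1) spent updating the p.m.f.
    """
    independent = search_bound(t2, h2)
    incremental = search_bound(t2, h2 - h_mutual)
    speedup = independent / incremental
    saved = independent - incremental - t_update
    return speedup, saved


# Hypothetical figures: 10 time units, 30 bits, 12 bits shared, 1000 for the update.
speedup, saved = incremental_gain(t2=10, h2=30, h_mutual=12, t_update=1000)
print(speedup, saved)
```

With these made-up inputs the speedup factor equals 2^12, independent of the
update time, while the time saved shrinks by exactly the update time; a
sufficiently slow update makes the saved time negative, which is the case where
incremental learning is in vain.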



During the experimental tests of our Stochastic Context Free Grammar based
search and update algorithms [25], we have observed that in practice we can
realize fast updates, and we can still achieve actual code re-use and tremendous
speed-up. Using only 0.5 teraflop/sec of computing speed and a reference
machine choice of R5RS Scheme [27], we solved 6 simple deterministic operator
induction problems in 245.1 seconds. This running time is compared to 7150
seconds without any updates. Scaled to human-level processing speed of 100
teraflop/sec, our system would learn and solve the entire training sequence in 1.25
seconds, which is (arguably) better than most human students. In one
particular operator induction problem (fourth power, $x^4$), we saw actual code re-use:
(define (pow4 x) (define (sqr x) (* x x)) (sqr (sqr x))), and an actual
speedup of 272. The gains that we saw confirmed the incremental learning
proposals of Solomonoff, mentioned in a good number of his publications, but most
clearly in [11,13,1]. Based on our work and the huge speedup observed in OOPS
for a shorter training sequence [25], we have come to believe that incremental
learning has the epistemological status of an additional AI axiom:

AI4 AI must be able to use its previous experience to speed up subsequent 
prediction tasks (Transfer Learning). 

This axiom is justified by observing that many universal induction problems 
are completely unsolvable by a system that does not have the adequate sort of 
algorithmic memory, regardless of the search method. 

The results above may be contrasted with inductive programming approaches,
since we predicted deterministic functions. One of the earliest and most
successful inductive programming systems is ADATE, which is optimized for a more
specific purpose. The ADATE system has yielded impressive results in an ML variant
by using user-supplied primitives and constraining candidate programs [28]. Universal
representations have been investigated in inductive logic programming as well
[29]; however, U-learning unfortunately lacks the extremely accurate
generalization of Solomonoff induction. It has been shown that incremental learning is
useful in the inductive programming framework [30], which supports our
observation of the necessity of incremental machine learning. Another relevant work
is a typed higher-order logic knowledge representation scheme based on term
representation of individuals and a rich representation language encompassing
many abstract data types [31]. A recent survey on inductive programming may
be found in [32].

We should also give an account of our brief correspondence with Solomonoff. We
expressed that the prediction algorithms were powerful but it seemed that
memory was not used sufficiently. Solomonoff responded by mentioning the potential
stochastic grammar and genetic programming approaches that he was working
on at the time. Our present research was motivated by a problem he posed
during the discussions of his seminars in Turing Days '06 at Bilgi University,
Istanbul: "We can use grammar induction for updating a stochastic context free
grammar, but there is a problem. We already know the grammar of the
reference machine." We designed our incremental learning algorithms to address this
particular problem^2. Solomonoff has also guided our research by making a
valuable suggestion, that it is more important to show whether incremental learning
works over a sequence of simpler problems than solving a difficult problem. We
have in addition investigated the use of the PPM family of compressors following his
proposal, but as we expected, they were not sufficient for guiding LISP-like
programs, and would require too many changes. Therefore, we proceeded directly to
the simplest kind of guiding p.m.f. that would work for Scheme, as we preferred
not to work on assembly-like languages for which PPM might be appropriate,
since, in our opinion, high-level languages embody more technological progress
(see also [33], which employs a Scheme subset). Colorfully speaking, inventing
a functional form in assembly might be like re-inventing the wheel. However, in
general, it would not be trivial for the induction system to invent syntax forms
that compare favorably to LISP, especially during preliminary training.
Therefore, much intelligence is already present in a high-level universal computer (AI0)
which we simply take advantage of.

5 Cognitive Architecture 

Another important discussion is whether a cognitive architecture is necessary.
The axiomatic approach was seen as counter-productive by some leading researchers
in the past. However, we think that their opinion can be expressed as follows:
the minimal program that realizes these axioms is not automatically intelligent,
because in practice an intelligent system requires a good deal of algorithmic
information to get off the ground. This is not a bad argument, since obviously,
the human brain is well equipped genetically. However, we cannot rule out either
that a somewhat compact system may achieve human-level general intelligence.
The question, therefore, is whether a simply described system like AIXI [34]
(an extension of Solomonoff induction to reinforcement learning) is sufficient in
practice, or there is a need for a modular/extensible cognitive architecture that
has been designed in particular ways to promote certain kinds of mental growth
and operation. Some proponents of general purpose AI research think that such
a cognitive architecture is necessary, e.g., OpenCog [35]. Schmidhuber has
suggested the famous Gödel Machine, which has a mechanical model of machine
consciousness [36]. Solomonoff himself proposed early on, in 2002, the design
of Alpha, a generic AI architecture which can ultimately solve free-form
time-limited optimization problems [13]. Although in his later works Solomonoff has
not made much mention of Alpha and has instead focused on the particulars
of the required basic induction and learning capability, his proposal nonetheless
remains one of the most extensible and elegant self-improving AI designs.



^2 We occasionally corresponded via e-mail. Before the AGI-10 conference, he had
reviewed a draft of my paper, and he had commented that the "learning
programming idioms" and "frequent subprogram mining" algorithms were interesting, which
was all the encouragement I needed. The last e-mail I received from him was on
11/Oct/2009. I regretfully learnt that he passed away a month later. His
independent character and true scientific spirit will always be a shining beacon for me.



Therefore, this point is open to debate, though some researchers may want to
assume another, entirely optional, axiom:

AI5 AI must be arranged such that self-improvement is feasible in a realistic 
mode of operation (Cognitive Architecture). 

It is doubtful for instance whether a combination of incremental learning and 
AIXI will result in a practical reinforcement learning agent. Neither is it well 
understood whether autonomous systems with built-in utility/goal functions are 
suitable for all practical purposes. We anticipate that such questions will be set- 
tled by experimenters, as the complexity of interesting experiments will quickly 
overtake theoretical analysis. 

We do not consider human-like behavior, or a robotic body, or an autonomous 
AI design, such as a goal-driven or reinforcement-learning agent, essential to 
intelligence, hence we did not propose autonomy or embodiment as an axiom. 
Solomonoff has commented likewise on the preferred target applications [37]:

To start, I'd like to define the scope of my interest in A.I. I am not 
particularly interested in simulating human behavior. I am interested in 
creating a machine that can work very difficult problems much better
and/or faster than humans can - and this machine should be embodied 
in a technology to which Moore's Law applies. I would like it to give a 
better understanding of the relation of quantum mechanics to general 
relativity. I would like it to discover cures for cancer and AIDS. I would 
like it to find some very good high temperature superconductors. I would 
not be disappointed if it were unable to pass itself off as a rock star. 

6 Philosophical Foundation and Consequences 

Solomonoff's AI theory is founded on a wealth of philosophy. Here, we shall 
briefly revisit the philosophical foundation of ALP and point out some of its 
philosophical consequences. In his posthumous publication, Solomonoff mentions 
the inspiration for some of his work: Carnap's idea that the state of the world 
can be represented by a finite bitstring (and that science predicts future bits 
with inductive inference), Turing's universal computer (AI0) as communicated
by Minsky and McCarthy, and Chomsky's generative grammars [T^]. The dis- 
covery of ALP is described by Solomonoff in quite a bit of detail in [35] , which 
relates his discovery to the background of many prominent thinkers and con- 
tributors. Carnap's empiricism seems to have been a highly influential factor 
in Solomonoff's research as he sought to find how science is carried out, rather 
than particular scientific findings; and ALP is a satisfactory solution to Carnap's 
program of inductive inference [14].

Let us then recall some philosophically relevant aspects of ALP discussed 
in the most recent publications of Solomonoff. First, the exact same method is 
used to solve both mathematical and scientific problems. This means that there 
is no fundamental epistemological difference between these problems. Our
interpretation is that this is well founded only when we observe that mathematical
problems themselves are computational or linguistic problems: in practice,
mathematical problems can be reduced to particular computational problems, and
this is why the same method works for both kinds of problems. Mathematical
facts do not preside over or precede physical facts; they themselves are ultimately
solutions of physical problems (e.g., does this particular kind of machine halt
or not?). And the substance of mathematics, the lucid sort of mathematical
language and concepts that we have invented, can be fully explained by Solomonoff
induction, as those are the kinds of useful programs which have aided an intellect
in its training, and therefore are retained as linguistic and algorithmic
information. The subjectivity and diversity aspects of ALP [12, Sections 3 & 4] fully
explain why there can be multiple and almost equally productive foundations 
of mathematics, as those merely point out somewhat equally useful formalisms 
invented by different mathematicians. There is absolutely nothing special about
ZFC theory; it is just a formal theory to explain some useful procedures that
we perform in our heads, i.e., it is more like the logical explanation of a set
module in a functional programming language than anything else. However, the
operations in a mathematician's brain are not visible to their owner, thereby
leading to useless Platonist fantasies of some mathematicians owing to a dearth
of philosophical imagination. Therefore, it does not matter much whether one
prefers this or that formalization of set theory, or category theory as a foun- 
dation, unless that choice restricts success in the solution of future scientific 
problems. Since such a problematic scientific situation does not seem to have
emerged yet (forcing us to choose among particular formalizations), the
diversity principle of ALP forces us to retain them all. That is to say, subscribing
to the ALP viewpoint has the unexpected consequence that we abandon both
Platonism and Formalism. There is a meaning in formal language, in the manner
in which it improves future predictions; however, there is not a single a priori
fact in addition to empirical observations, and no such fact is ever needed to
conduct empirical work, except a proper realization of axioms AI1-AI3 (and surely
no sane scientist would accept that there is a unique and empty set that exists
in a hidden order of reality). When we consider these axioms, we need to
understand the universality of computation, and the principled manner in which
we have to employ it for reliable induction in our scientific inquiries. The only 
physically relevant assumption is that of the computability of the distributions 
which generate our empirical problems (regardless of whether the problem is 
mathematical or scientific), and the choice of a universal computer which
introduces a necessary subjectivity. The computability aspect may be interpreted as
information finitism: all the problems that we can work with should have finite
entropy. Yet, this restriction on disorder is not at all limiting, for it is hardly
conceivable how one may wish to solve a problem of actually infinite complexity.
Therefore, this is not much of an assumption for scientific inquiry, especially
given that both quantum mechanics and general relativity can be described in
computable mathematics (see for instance [39] about the applicability of
computable mathematics to quantum mechanics). And neither can one hope to find



an example of a single scientifically valid problem in any textbook of science 
that requires the existence of distributions with infinite complexity to solve. 

With regards to general epistemology, ALP/AIT may be seen as largely in- 
compatible with non-reductionism. Non-reductionism is quite misleading in the 
manner it is usually conveyed. Instead, we must seek to understand irreducibil- 
ity in the sense of AIT, of quantifying algorithmic information, which allows 
us to reconcile the concept of irreducibility with physicalism (which we think 
every empiricist should accept) [40]. In particular, we can partially formalize
the notion of knowledge by mutual information between the world and a brain. 
Our paper proposed a physical solution to the problem of determining the most 
"objective" universal computer: it is the universe itself. If digital physics were 
true, this might be for instance a particular kind of graph automata, or if quan- 
tum mechanics were the basis, then a universal quantum computer could be 
used; however, for many tasks using such a low-level computer might be ex- 
traordinarily difficult. We also argued that extreme non-reductionism leads to 
arguments from ignorance such as ontological dualism, and information theory 
is much better suited to explaining evolution and the need for abstractions in 
our language. It should also be obvious that the ALP solution to AI extends 
the two main tenets of logical positivism, which are verificationism and unified 
science, as it gives a finite cognitive procedure with which one can conduct all 
empirical work, and allows us to develop a private language with which we can 
describe all of science and mathematics. However, we should also mention that 
this strengthened positivism does not require a strict analytic-synthetic distinc- 
tion; a spectrum of analytic-synthetic distinction as in Quine's philosophy seems 
to be acceptable [41]. We have already seen that according to ALP,
mathematical and scientific problems have no real distinction; therefore, like Quine, ALP
would allow revising even mathematical logic itself, and we need not remind
the reader that the concept of universal computer itself has not appeared out of
thin air, but has been invented due to the laborious mental work of scientists, as
they abstracted from the mechanics of performing mathematics; at the bottom these
are all empirical problems [35]. On the other hand, a "web of belief" as in Quine,
by no means suggests non-reductionism, for that could be true only if indeed 
there were phenomena that had unscathable (infinite) complexity, such as Tur- 
ing oracle machines which were not proposed as physical machines, but only as 
a hypothetical concept [TB]. Quine himself was a physicalist; we do not think 
that he would support the later vendetta against reductionism which may be 
a misunderstanding of his holism. It may be argued, though, that his obscure
version of Platonism, which does not seem very scientific to us, may be the
culprit. Today's Bayesian networks seem to be a good formalization of Quine's 
web of belief, and his instrumentalism is consistent with the ALP approach of 
maintaining useful programs. Therefore, on this account, psychology ought to be 
reducible to neurophysiology, as the concept of life to molecular biology, because 
these are all ultimately sets of problems that overlap in the physical world, and 
the relation between them cannot hold an infinite amount of information; which 
would require an infinitely complex local environment, and that does not seem 



consistent with our scientific observations. That is to say, discovery of bridge
disciplines is possible, as exemplified by quantum chemistry and molecular
biology, and it is not different from any other kind of empirical work. Recently,
it has perhaps become better understood in popular culture that creationism
and non-reductionism are almost synonymous (regarding the claims of "intelligent
design" that the flagella of bacteria are too complex to have evolved). Note
that ALP has no qualms with the statistical behavior of quantum systems, as it
allows non-determinism. Moreover, the particular kind of irreducibility in AIT
corresponds to weak emergentism, and most certainly contradicts strong
emergentism, which implies supernatural events. Please see also pL?;, Section 7]
for a discussion of philosophical problems related to algorithmic complexity. 

7 Intellectual Property Towards Infinity Point 

In 1985, Solomonoff proposed the infinity point hypothesis, also known as the
singularity [43] (the first paper on the subject): exponentially accelerating
technological progress caused by human-level AIs that complement the scientific
community, accelerating our progress ad infinitum within a finite, short time
(in practice, only a finite but significant factor of improvement could be
expected). Solomonoff proposed seven milestones of AI development: A: modern
AI phase (1956 Dartmouth conference), B: general theory of problem solving
(our interpretation: Solomonoff Induction, Levin Search), C: self-improving AI
(our interpretation: Alpha architecture, 2002), D: AI that can understand
English (our interpretation: not realized yet), E: human-level AI, F: an AI at the
level of the entire computer science (CS) community, G: an AI many times smarter
than the entire CS community.

A weak condition for the infinity point may be obtained by an economic
argument, also covered briefly in [33]. The human brain delivers roughly
5 teraflops/watt. The current incarnation of NVIDIA's general-purpose graphics
processing unit (GPGPU) architecture, called Fermi, achieves about 6 gigaflops/watt [44].
Assuming 85% improvement in power efficiency per year (as seen in NVIDIA's
projections), in 12 years, human-level energy efficiency of computing will be
achieved. After that date, even if mathematical AI fails due to an unforeseen
problem, we will be able to run brain simulations faster than our own brains, using
less energy than humans, effectively creating a bio-information based AI which meets
the basic requirement of the infinity point. For this to occur, whole brain simulation
projects must be comprehensive in operation and efficient enough [45]. Otherwise,
the human-level AIs that we construct should match the computational
efficiency of the human brain. This weaker condition rests on an economic obser- 
vation: the economic incentive of cheaper intellectual work will drive the prolif- 
eration of personal use of brain simulations. According to NVIDIA's projections, 
thus, we can expect the necessary conditions for the infinity point to materialize 
by 2023, after which point technological progress may accelerate very rapidly. 
According to a recent paper by Koomey, the energy efficiency of computing is 
doubling every 1.5 years (about 60% per year), regardless of architecture, which 
would set the date at 2026 \M- 
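
The arithmetic behind these dates can be checked with a short sketch. It assumes pure compound growth from the two efficiency figures quoted above; the function and variable names are illustrative and do not come from the cited sources. Under these assumptions the gap is closed in roughly 11 years at 85% per year and in about 14.5 years at Koomey's rate, so the figures in the text appear to be rounded; no base year is assumed here.

```python
import math

def years_to_parity(target, current, annual_improvement):
    """Years of compound growth needed for `current` to reach `target`."""
    return math.log(target / current) / math.log(1.0 + annual_improvement)

BRAIN_FLOPS_PER_WATT = 5e12  # 5 teraflops/watt, figure quoted in the text
FERMI_FLOPS_PER_WATT = 6e9   # 6 gigaflops/watt, figure quoted in the text

# Projection quoted in the text: 85% improvement in efficiency per year.
nvidia_years = years_to_parity(BRAIN_FLOPS_PER_WATT, FERMI_FLOPS_PER_WATT, 0.85)

# Koomey's trend: doubling every 1.5 years, converted to an annual rate.
koomey_rate = 2 ** (1 / 1.5) - 1
koomey_years = years_to_parity(BRAIN_FLOPS_PER_WATT, FERMI_FLOPS_PER_WATT, koomey_rate)

print(f"efficiency gap: {BRAIN_FLOPS_PER_WATT / FERMI_FLOPS_PER_WATT:.0f}x")
print(f"85% per year: {nvidia_years:.1f} years")
print(f"Koomey ({koomey_rate:.0%} per year): {koomey_years:.1f} years")
```

The annual rate derived from a 1.5-year doubling time comes out just under 60%, which agrees with the "about 60% per year" stated above.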



Assume that we are progressing towards the hypothetical infinity point. 
Then, the entire human civilization may be viewed as a global intelligence
working on technological problems. The practical necessity of incremental learning
suggests that when faced with more difficult problems, better information
sharing is required. If no information is shared between researchers (i.e.,
different search programs), then they will lose time traversing overlapping
program subspaces. This is most clearly seen in the case of simultaneous inventions,
when an idea is said to be "up in the air" and is invented by multiple,
independent parties at around the same time. If intellectual property (IP) laws are too rigid and
costly, this would entail that there is minimal information sharing, and after 
some point, the global efficiency of solving non-trivial technological problems 
would be severely hampered. Therefore, to better utilize the effects of the
infinity point, knowledge sharing must be encouraged in society. Maximum efficiency in
this fashion can be provided by free software licenses and a reform of the patent
system. Our view is that no single company or organization can (or should) have 
a monopoly on the knowledge resources to attack problems with truly large algo- 
rithmic complexity (monopoly is mostly illegal presently at any rate). We tend 
to think that sharing science and technology is the most efficient path towards 
the infinity point. Naturally, free software philosophy is not acceptable to many
commercial enterprises; thus, we suggest that as technology advances, the
overhead of enforcing IP laws be taken into account. If technology starts to advance
much more rapidly, the duration of IP protection may be shortened; for
instance, after the AI milestone F, the bureaucracy and restrictions of IP law
may be a serious bottleneck.
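
The claim made earlier in this section, that researchers who do not share information lose time traversing overlapping program subspaces, can be illustrated with a toy model. This is only a sketch under strong assumptions: the "programs" are plain bitstrings enumerated shortest first, every researcher evaluates one candidate per round, and isolated researchers are assumed to follow the same enumeration (the worst case of complete overlap). All names are hypothetical.

```python
from itertools import product

def enumerate_programs(max_len):
    """Yield all bitstrings of length 1..max_len, shortest first."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def search_cost(target, num_researchers, max_len, sharing):
    """Return (rounds, total_evaluations) until `target` is found.

    Each researcher evaluates one candidate per round.
    """
    for index, program in enumerate(enumerate_programs(max_len)):
        if program == target:
            if sharing:
                # Candidates are dealt out round-robin, so no work is repeated.
                rounds = index // num_researchers + 1
            else:
                # Everyone walks the same enumeration independently.
                rounds = index + 1
            return rounds, rounds * num_researchers
    raise ValueError("target not in the enumerated space")

isolated = search_cost("1011", num_researchers=4, max_len=4, sharing=False)
shared = search_cost("1011", num_researchers=4, max_len=4, sharing=True)
print("no sharing (rounds, evaluations):", isolated)
print("sharing    (rounds, evaluations):", shared)
```

In this idealized setting, sharing reduces both the elapsed rounds and the total number of evaluations by about a factor equal to the number of researchers; real research overlaps only partially, so the actual saving would lie somewhere between the two cases.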

8 Conclusion 

We have mentioned diverse consequences of ALP in the axiomatization of AI,
philosophy, and technological society. We have also related our own research to
Solomonoff's proposals. We interpret ALP and AIT as a fundamentally new
world-view which allows us to bridge the gap between complex natural
phenomena and positive sciences more closely than ever. This paradigm shift has
resulted in various breakthrough applications and is likely to benefit society
in the foreseeable future.